Search CORE

60 research outputs found

Cascaded Segmentation-Detection Networks for Word-Level Text Spotting

Authors: Manduchi Roberto
Qin Siyang
Publication date: 03/04/2017

We introduce an algorithm for word-level text spotting that accurately and reliably determines the bounding regions of individual words of text "in the wild". Our system consists of a cascade of two convolutional neural networks. The first network is fully convolutional and detects areas containing text, producing a very reliable but possibly inaccurate segmentation of the input image. The second network (inspired by the popular YOLO architecture) analyzes each segment produced in the first stage and predicts oriented rectangular regions containing individual words. No post-processing (e.g., text line grouping) is necessary. With an execution time of 450 ms for a 1000-by-560 image on a Titan X GPU, our system achieves the highest score to date among published algorithms on the ICDAR 2015 Incidental Scene Text benchmark. Comment: 7 pages, 8 figures

arXiv.org e-Print Archive

Crossref

eScholarship - University of California
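The text-spotting abstract above says the second network predicts oriented rectangular regions for individual words, but it does not say how those regions are parameterized. As a sketch only, assuming a hypothetical (center x, center y, width, height, angle in radians) parameterization, the four corners of such a region can be recovered like this; the function name `oriented_box_corners` is illustrative and not taken from the paper.

```python
import math

def oriented_box_corners(cx, cy, w, h, theta):
    """Return the four corners of a w x h rectangle centered at (cx, cy),
    rotated by theta radians (counter-clockwise in a y-up frame).

    Hypothetical parameterization, chosen only for illustration.
    """
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    # Corner offsets in the box's own (unrotated) frame
    local = [(-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2)]
    corners = []
    for dx, dy in local:
        # Rotate each offset by theta, then translate it to the box center
        x = cx + dx * cos_t - dy * sin_t
        y = cy + dx * sin_t + dy * cos_t
        corners.append((x, y))
    return corners

# Axis-aligned 4 x 2 box centered at (10, 5)
print(oriented_box_corners(10, 5, 4, 2, 0.0))
# -> [(8.0, 4.0), (12.0, 4.0), (12.0, 6.0), (8.0, 6.0)]
```

In image coordinates, where y points down, the same formula rotates visually clockwise; only the sign convention of theta changes.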

Automatic Semantic Content Removal by Learning to Neglect

Authors: Manduchi Roberto
Qin Siyang
Wei Jiahui
Publication date: 01/01/2018

We introduce a new system for automatic image content removal and inpainting. Unlike traditional inpainting algorithms, which require prior knowledge of the region to be filled in, our system automatically detects the area to be removed and infilled. Region segmentation and inpainting are performed jointly in a single pass, so potential segmentation errors are more naturally alleviated by the inpainting module. The system is implemented as an encoder-decoder architecture with two decoder branches: one performs segmentation of the foreground region, the other performs inpainting. The encoder and the two decoder branches are linked via neglect nodes, which guide the inpainting process in selecting which areas need reconstruction. The whole model is trained using a conditional GAN strategy. Comparative experiments show that our algorithm outperforms state-of-the-art inpainting techniques (which, unlike our system, do not segment the input image and thus must be aided by an external segmentation module). Comment: Accepted to BMVC 2018 as an oral presentation

arXiv.org e-Print Archive

eScholarship - University of California
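The content-removal abstract above says neglect nodes guide which areas need reconstruction, but it gives no formula for them. A common and much simpler way to confine inpainting to a predicted region is soft mask compositing: keep input pixels where the mask is 0 and take inpainted pixels where it is 1. The sketch below shows only that compositing idea on plain nested lists; the function `composite` is hypothetical and is not the paper's neglect-node definition.

```python
def composite(image, inpainted, mask):
    """Blend per pixel: mask * inpainted + (1 - mask) * image.

    All three arguments are equally sized 2-D lists. Mask values lie in [0, 1],
    where 1 marks a pixel predicted as foreground that should be removed.
    (Hypothetical helper for illustration only.)
    """
    out = []
    for img_row, inp_row, m_row in zip(image, inpainted, mask):
        # Convex combination of the original and the inpainted pixel
        out.append([m * p + (1.0 - m) * x for x, p, m in zip(img_row, inp_row, m_row)])
    return out

image = [[10.0, 20.0], [30.0, 40.0]]
inpainted = [[0.0, 0.0], [0.0, 0.0]]
mask = [[0.0, 1.0], [0.5, 0.0]]
print(composite(image, inpainted, mask))  # -> [[10.0, 0.0], [15.0, 40.0]]
```

A soft (fractional) mask lets a segmentation error near a boundary degrade gracefully instead of producing a hard seam, which is in the spirit of the abstract's point that joint segmentation and inpainting can absorb segmentation errors.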

Multiple Instance Curriculum Learning for Weakly Supervised Object Detection

Authors: Huang Qin
Kuo C. -C. Jay
Li Siyang
Xu Hao
Zhu Xiangxin
Publication date: 01/01/2017

When supervising an object detector with weakly labeled data, most existing approaches are prone to getting trapped in discriminative object parts (e.g., finding the face of a cat instead of its full body) because they lack supervision on the full extent of objects. To address this challenge, we incorporate object segmentation into detector training, which guides the model to correctly localize full objects. We propose the multiple instance curriculum learning (MICL) method, which injects curriculum learning (CL) into the multiple instance learning (MIL) framework. MICL starts by automatically picking easy training examples, in which the extent of the segmentation mask agrees with the detection bounding box. The training set is then gradually expanded to include harder examples, yielding strong detectors that handle complex images. With segmentation in the loop, MICL outperforms state-of-the-art weakly supervised object detectors by a substantial margin on the PASCAL VOC datasets. Comment: Published in BMVC 201

arXiv.org e-Print Archive

Crossref
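The MICL abstract above describes the curriculum only qualitatively: begin with examples whose segmentation mask agrees with the detection box, then expand to harder ones. One way to sketch such a schedule, assuming that agreement is measured as the IoU between the mask's tight bounding box and the detection box, and that stages use a decreasing list of thresholds (both hypothetical choices, not taken from the paper):

```python
def mask_to_box(mask):
    """Tight (x0, y0, x1, y1) box around nonzero pixels of a 2-D 0/1 list, or None."""
    ys = [y for y, row in enumerate(mask) if any(row)]
    xs = [x for row in mask for x, v in enumerate(row) if v]
    if not ys:
        return None
    return (min(xs), min(ys), max(xs) + 1, max(ys) + 1)

def iou(a, b):
    """Intersection over union of two (x0, y0, x1, y1) boxes."""
    ix = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def curriculum_stages(examples, thresholds=(0.8, 0.5, 0.0)):
    """Yield the (growing) training set for each curriculum stage.

    examples: list of (example_id, detection_box, segmentation_mask).
    thresholds: hypothetical, decreasing mask/box agreement cut-offs, one per stage.
    """
    scored = []
    for ex_id, box, mask in examples:
        mbox = mask_to_box(mask)
        # Examples with an empty mask get zero agreement and enter last
        scored.append((ex_id, iou(box, mbox) if mbox else 0.0))
    for t in thresholds:
        yield [ex_id for ex_id, score in scored if score >= t]

mask = [[0, 0, 0, 0],
        [0, 1, 1, 0],
        [0, 1, 1, 0],
        [0, 0, 0, 0]]
examples = [("easy", (1, 1, 3, 3), mask), ("hard", (0, 0, 4, 4), mask)]
for stage, subset in enumerate(curriculum_stages(examples)):
    print(stage, subset)
# -> 0 ['easy']
#    1 ['easy']
#    2 ['easy', 'hard']
```

In a real training loop each stage would retrain or fine-tune the detector on its subset; the sketch covers only the example-selection side.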

Hierarchical Text Spotter for Joint Text Spotting and Layout Analysis

Authors: Bissacco Alessandro
Fujii Yasuhisa
Long Shangbang
Qin Siyang
Raptis Michalis
Publication date: 25/10/2023

We propose Hierarchical Text Spotter (HTS), a novel method for the joint task of word-level text spotting and geometric layout analysis. HTS can recognize text in an image and identify its 4-level hierarchical structure: characters, words, lines, and paragraphs. HTS has two novel components: (1) a Unified-Detector-Polygon (UDP) that produces Bézier curve polygons of text lines and an affinity matrix for grouping detected lines into paragraphs; (2) a Line-to-Character-to-Word (L2C2W) recognizer that splits lines into characters and then merges them back into words. HTS achieves state-of-the-art results on multiple word-level text spotting benchmark datasets as well as on geometric layout analysis tasks. Comment: Accepted to WACV 202

arXiv.org e-Print Archive
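The HTS abstract above says UDP outputs an affinity matrix between detected lines for paragraph grouping, but not how that matrix becomes paragraphs. A plausible minimal post-step, assuming a symmetric matrix of scores in [0, 1] and a hypothetical threshold of 0.5, is to link every pair of lines above the threshold and take connected components with union-find. The function `group_lines` is illustrative only.

```python
def group_lines(affinity, threshold=0.5):
    """Group line indices into paragraphs: connected components of the graph
    whose edges are line pairs with affinity >= threshold.

    affinity: symmetric n x n list of scores (only the upper triangle is read).
    """
    n = len(affinity)
    parent = list(range(n))

    def find(i):
        # Path halving keeps the union-find trees shallow
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if affinity[i][j] >= threshold:
                parent[find(i)] = find(j)  # merge the two components

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())

affinity = [
    [1.0, 0.9, 0.1, 0.0],
    [0.9, 1.0, 0.2, 0.1],
    [0.1, 0.2, 1.0, 0.8],
    [0.0, 0.1, 0.8, 1.0],
]
print(group_lines(affinity))  # -> [[0, 1], [2, 3]]
```

Connected components make grouping transitive: lines A and C end up in the same paragraph whenever both are linked to B, even if A and C have low direct affinity.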

Automatic skin and hair masking using fully convolutional networks

Authors: Kim Seongdo
Manduchi Roberto
Qin Siyang
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/07/2017

Crossref

eScholarship - University of California

Towards End-to-End Unified Scene Text Detection and Layout Analysis

Authors: Bissacco Alessandro
Fujii Yasuhisa
Long Shangbang
Panteleev Dmitry
Qin Siyang
Raptis Michalis
Publication date: 28/03/2022

Scene text detection and document layout analysis have long been treated as two separate tasks in different image domains. In this paper, we bring them together and introduce the task of unified scene text detection and layout analysis. We introduce the first hierarchical scene text dataset to enable this novel research task. We also propose a novel method that simultaneously detects scene text and forms text clusters in a unified way. Comprehensive experiments show that our unified model outperforms multiple well-designed baseline methods. Additionally, the model achieves state-of-the-art results on multiple scene text detection datasets without the need for complex post-processing. Dataset and code: https://github.com/google-research-datasets/hiertext. Comment: To appear at CVPR 202

arXiv.org e-Print Archive

Instance Embedding Transfer to Unsupervised Video Object Segmentation

Authors: Fathi Alireza
Huang Qin
Kuo C. -C. Jay
Li Siyang
Seybold Bryan
Vorobyov Alexey
Publication date: 26/02/2018

We propose a method for unsupervised video object segmentation that transfers the knowledge encapsulated in image-based instance embedding networks. The instance embedding network produces an embedding vector for each pixel that enables identifying all pixels belonging to the same object. Although trained on static images, the instance embeddings are stable over consecutive video frames, which allows us to link objects together over time. We therefore adapt instance networks trained on static images to video object segmentation and combine the embeddings with objectness and optical flow features, without model retraining or online fine-tuning. The proposed method outperforms state-of-the-art unsupervised segmentation methods on the DAVIS and FBMS datasets. Comment: To appear in CVPR 201

arXiv.org e-Print Archive

Crossref
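The instance-embedding abstract above notes that per-pixel embeddings stay stable across consecutive frames, which is what allows objects to be linked over time. A minimal sketch of that linking step, assuming each object is summarized by the mean of its pixel embeddings and matched to the most similar object in the previous frame by cosine similarity above a hypothetical threshold. The paper additionally uses objectness and optical flow features, which are omitted here, and all function names are illustrative.

```python
import math

def mean_embedding(pixel_embeddings):
    """Average a non-empty list of equal-length embedding vectors."""
    n = len(pixel_embeddings)
    dim = len(pixel_embeddings[0])
    return [sum(v[k] for v in pixel_embeddings) / n for k in range(dim)]

def cosine(a, b):
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def link_objects(prev_objects, curr_objects, min_sim=0.7):
    """Map each current object id to the most similar previous object id,
    or None when no previous object reaches min_sim (hypothetical threshold).

    prev_objects, curr_objects: dict of object id -> list of pixel embeddings.
    """
    prev_means = {oid: mean_embedding(px) for oid, px in prev_objects.items()}
    links = {}
    for oid, px in curr_objects.items():
        m = mean_embedding(px)
        best_id, best_sim = None, min_sim
        for pid, pm in prev_means.items():
            s = cosine(m, pm)
            if s >= best_sim:
                best_id, best_sim = pid, s
        links[oid] = best_id
    return links

prev = {"a": [[1.0, 0.0], [0.9, 0.1]], "b": [[0.0, 1.0]]}
curr = {"x": [[0.0, 0.9], [0.1, 1.0]], "y": [[1.0, 0.05]], "z": [[-1.0, 0.0]]}
print(link_objects(prev, curr))  # -> {'x': 'b', 'y': 'a', 'z': None}
```

This matching is greedy and allows two current objects to link to the same previous one; an optimal one-to-one assignment (e.g., the Hungarian algorithm) would be a natural alternative.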